treeKL: A distance between high dimension empirical distributions
Authors
Abstract
This paper offers a methodological contribution for computing the distance between two empirical distributions in a Euclidean space of very large dimension. We propose to use decision trees instead of relying on standard quantification of the feature space. Our contribution is twofold: we first define a new distance between empirical distributions, based on the Kullback-Leibler (KL) divergence between the distributions over the leaves of decision trees built for the two empirical distributions. Then, we propose a new procedure to build these unsupervised trees efficiently. The performance of this new metric is illustrated on image clustering and neuron classification. Results show that the tree-based method outperforms methods based on standard bag-of-features procedures.
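The idea in the abstract can be illustrated with a minimal sketch: push both samples through one shared unsupervised tree, count how often each sample lands in each leaf, and take a symmetrised KL divergence between the two leaf histograms. This is not the authors' implementation; the random median-split tree, the function names (`build_random_tree`, `leaf_index`, `tree_kl`), the fixed depth, and the additive smoothing `eps` are all illustrative assumptions standing in for the paper's tree-construction procedure.

```python
import numpy as np

def build_random_tree(X, depth=3, rng=None):
    """Recursively split on a random feature at its median.
    Returns nested (feature, threshold, left, right) tuples; leaves are None.
    A simple stand-in for the paper's unsupervised tree construction."""
    rng = np.random.default_rng(0) if rng is None else rng
    def split(idx, d):
        if d == 0 or len(idx) < 2:
            return None
        f = int(rng.integers(X.shape[1]))
        t = np.median(X[idx, f])
        left, right = idx[X[idx, f] <= t], idx[X[idx, f] > t]
        if len(left) == 0 or len(right) == 0:
            return None
        return (f, t, split(left, d - 1), split(right, d - 1))
    return split(np.arange(len(X)), depth)

def leaf_index(tree, x, depth=3):
    """Path-encode the leaf reached by sample x (one bit per split).
    Early-terminated branches reuse shorter codes; since both samples
    share the same tree, the mapping is still consistent."""
    code, node = 0, tree
    for _ in range(depth):
        if node is None:
            break
        f, t, l, r = node
        bit = int(x[f] > t)
        code = (code << 1) | bit
        node = r if bit else l
    return code

def tree_kl(P, Q, depth=3, eps=1e-6):
    """Symmetrised KL divergence between the leaf-occupancy histograms
    of samples P and Q pushed through one tree built on their union."""
    tree = build_random_tree(np.vstack([P, Q]), depth)
    n_leaves = 2 ** depth
    hp = np.full(n_leaves, eps)   # smoothing avoids log(0)
    hq = np.full(n_leaves, eps)
    for x in P:
        hp[leaf_index(tree, x, depth)] += 1
    for x in Q:
        hq[leaf_index(tree, x, depth)] += 1
    hp /= hp.sum()
    hq /= hq.sum()
    return float(np.sum(hp * np.log(hp / hq)) + np.sum(hq * np.log(hq / hp)))
```

In this sketch, identical samples yield identical histograms and hence a zero distance, while distributions that occupy different leaves yield a strictly positive one; the paper's method would additionally aggregate over trees built for each distribution.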
Similar resources
On the Mahalanobis-distance based penalized empirical likelihood method in high dimensions
In this paper, we consider the penalized empirical likelihood (PEL) method of Bartolucci (2007) for inference on the population mean which is a modification of the standard empirical likelihood and employs a penalty based on the Mahalanobis-distance. We derive the asymptotic distributions of the PEL ratio statistic when the dimension of the observations increases with the sample size. Finite sa...
Empirical investigation of tourists' perceived psychic distance of Iran as a tourism destination
The aim of the current study was to investigate the perceived psychic distance of potential tourists in relation to Iran as a tourism destination. The concept of psychic distance refers to perceived similarities/differences between a specific destination and the tourist's home country. The members of the couch-surfing virtual community participated in this study. The statistical data were collected by conve...
Testing for Equal Distributions in High Dimension
We propose a new nonparametric test for equality of two or more multivariate distributions based on Euclidean distance between sample elements. Several consistent tests for comparing multivariate distributions can be developed from the underlying theoretical results. The test procedure for the multisample problem is developed and applied for testing the composite hypothesis of equal distributio...
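A test statistic built from Euclidean distances between sample elements, as described above, can be sketched as the energy distance between two samples: twice the mean between-sample distance minus the two mean within-sample distances. This is a simplified illustration, not the referenced paper's procedure; the function names are assumptions, and the within-sample averages here include the zero diagonal, a small bias that a careful implementation would correct.

```python
import numpy as np

def mean_pdist(A, B):
    """Mean pairwise Euclidean distance between the rows of A and B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.mean()

def energy_distance(X, Y):
    """Energy-distance statistic 2*E|X-Y| - E|X-X'| - E|Y-Y'|.
    Zero when the two samples coincide; positive when they differ."""
    return 2 * mean_pdist(X, Y) - mean_pdist(X, X) - mean_pdist(Y, Y)
```

In a permutation test, this statistic would be recomputed under random relabelings of the pooled sample to obtain a p-value for the hypothesis of equal distributions.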
Wasserstein Distance Measure Machines
This paper presents a distance-based discriminative framework for learning with probability distributions. Instead of using kernel mean embeddings or generalized radial basis kernels, we introduce embeddings based on dissimilarity of distributions to some reference distributions denoted as templates. Our framework extends the theory of similarity of Balcan et al. (2008) to the population distri...
Relations between Renyi Distance and Fisher Information
In this paper, we first show that Renyi distance between any member of a parametric family and its perturbations, is proportional to its Fisher information. We, then, prove some relations between the Renyi distance of two distributions and the Fisher information of their exponentially twisted family of densities. Finally, we show that the partial ordering of families induced by Renyi dis...
Journal: Pattern Recognition Letters
Volume 34, Issue -
Pages: -
Publication date: 2013